OBHave User Group
Most Recommendation Systems do not scale well, because they treat users as individual entities instead of masses. That is a dangerous philosophy, because it rests on the naive assumption that Machine Learning can understand human beings. A better assumption is that Machine Learning can augment human behavior so that wanted goals are reached more effectively. A Recommendation System should bring a company a good Return on Investment, so the best way to build one is to scale its Machine Learning components in relation to the quality of the recommendations: the business analysts should measure the value of the recommendations and compare it to the costs.

OBHave is designed to minimize the required manual labor and to provide effective monitoring and prioritization tools for computing power and business opportunities. User Groups represent different behavior patterns, so the business analysts can help the admins prioritize how often certain Machine Learning tasks are performed for each kind of User Group. User Groups enable OBHave to be based on a Black Box architecture, which minimizes the need for additional coding. It is not a Ferrari, but in my opinion it is better to have a Black Box solution that spends computing power to generate better recommendations than one that spends the working hours of programmers, data scientists and business analysts.

Formation And Membership of a User Group

The server architecture and the available computational power determine a target number of User Groups, because most background Machine Learning operations depend on User Groups. The nature of the site determines which User Group formation strategies are used and how many User Groups each user has to belong to per strategy. For example, we could have a system with an "Opinion Leader" strategy that has three steps, where each user belongs to two User Groups per strategy step. The Opinion Leader strategy analyses the Content Items (CI) consumed by the user and by the User Group.
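To make the membership idea concrete, here is a minimal sketch of how one strategy step could pick the best-matching User Groups for a user. It is not OBHave's actual matching logic: the Jaccard overlap measure, the function names `jaccard` and `assign_groups`, and the sample data are all assumptions made for illustration. The only things taken from the text are that a step compares Content Items consumed by the user and by the User Group, and that a user belongs to two User Groups per step.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap between two item sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def assign_groups(
    recent_items: List[str],            # user's consumed CIs, newest first
    group_items: Dict[str, Set[str]],   # CIs associated with each User Group
    window: int,                        # how many recent items this step compares
    groups_per_step: int = 2,           # "two User Groups per strategy step"
) -> List[str]:
    """Return the best-matching User Groups for one strategy step.

    Jaccard similarity is an assumed stand-in for the real matching rule.
    """
    user_set = set(recent_items[:window])
    ranked = sorted(
        group_items,
        # Highest overlap first; ties are broken by group name for determinism.
        key=lambda g: (-jaccard(user_set, group_items[g]), g),
    )
    return ranked[:groups_per_step]


# Hypothetical data: three groups and a user whose two newest items are "a" and "b".
groups = {
    "g1": {"a", "b", "c"},
    "g2": {"a", "d"},
    "g3": {"x", "y"},
}
chosen = assign_groups(["a", "b", "z"], groups, window=2)
print(chosen)
```

With a window of two, only the items "a" and "b" are compared, so the older item "z" has no influence on the result.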
I call the three steps of the "Opinion Leader" strategy "Classics", "Opinion Leaders" and "Experimentals". The Experimental User Group selection strategy compares the two most recent items, the Opinion Leader strategy compares 10 and the Classic strategy compares 50 (the 20/80 rule: 50 * 0.2 = 10, 10 * 0.2 = 2). When a User Group performs badly, the likelihood of a user being removed from the group grows, and a new group is found by the best match for the current strategies and their weights. The weights are determined by the stress level of the Machine, which is caused by the recommendations' lack of success in finding CIs for the user to consume with the least navigational effort. Formation of a new User Group is a "rare" event triggered by a combination of Learning stress and a growing user base. And of course, User Groups can also vanish when the Learning stress level is low and user activity is decreasing.

Recommendation System and User Group Machine Learning Responsibilities

When generating recommendations tailored for each user, the User Groups cast votes, which define the probabilities of certain content being recommended to the user. I used Representative Democracy as the inspiration for my model: user behavior casts votes for representatives, who cast votes for recommendation decisions.

User Groups host various Rules and Content Item Sets. The behavior of individual users is computed as group behavior, and those relations connect User Groups to Content Items. Content Items connect to Rules and Content Item Sets. Some Rules and Content Item Sets form connections between User Groups, and some may even be global. Rules related to content properties are always global because they do not track user behavior, e.g. the product categories of very common product groups. Because User Groups are the "root of all evil" and OBHave utilizes a Graph database (at least for now), most Machine Learning and Recommendation queries start from User Groups.
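The two-level voting described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `recommendation_probabilities`, the idea of a numeric membership weight per group, the normalization of each group's ballot, and the sample numbers are all hypothetical and not taken from OBHave. Votes are assumed to be non-negative.

```python
from collections import defaultdict
from typing import Dict


def recommendation_probabilities(
    membership: Dict[str, float],              # group -> weight of the user's "vote" for it
    group_votes: Dict[str, Dict[str, float]],  # group -> votes per Content Item
) -> Dict[str, float]:
    """Combine group votes into a probability distribution over Content Items."""
    tally: Dict[str, float] = defaultdict(float)
    for group, weight in membership.items():
        votes = group_votes.get(group, {})
        total = sum(votes.values())
        if total <= 0:
            continue  # a group without votes has no say
        for item, v in votes.items():
            # Each group's ballot is normalized, then scaled by the user's weight.
            tally[item] += weight * v / total
    norm = sum(tally.values())
    return {item: score / norm for item, score in tally.items()} if norm else {}


# Hypothetical example: the user is represented equally by two groups.
probs = recommendation_probabilities(
    membership={"g1": 0.5, "g2": 0.5},
    group_votes={
        "g1": {"ci1": 3.0, "ci2": 1.0},
        "g2": {"ci2": 1.0, "ci3": 1.0},
    },
)
print(probs)
```

Normalizing each group's ballot before weighting keeps a group with many votes from drowning out the others. Whether OBHave does this is not stated in the text, so treat it as one possible design choice.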
Observing the world from one point of view (or perspective, if you will) is essential for reducing complexity and improving maintainability. In a Graph database "joins" should be inexpensive, and this should scale well.

Scalability, Business Analysis and OBHave Core Design Principles

If the service needs more returning users, the system can be encouraged to provide interesting content items. If the service needs to make more sales, the system can focus on sales patterns. User Groups generate new rules, and these rules can be learned by other User Groups (which affects how they Learn and Recommend). In many Big Data projects these kinds of opportunities require a huge amount of number crunching to be discovered and additional programming to be exploited. OBHave is designed to discover and exploit opportunities naturally when it has access to the user events related to the opportunity.

Netflix uses a White Box approach to improving their Recommendation System: their developers come up with ideas, which they refine into hypotheses; then they code a solution, which is tested and iterated in off-line and on-line environments; and if its performance (computational and qualitative) is good enough, the solution is added to the pool of algorithms Netflix uses. OBHave uses a Black Box approach: you put in user data and learning goals and get working recommendations in return. The quality of the recommendations is based not on the skills of the engineering team but on the art of choosing user events and goals, combined with the computational power given to the OBHave server. I like this approach because, in my experience, software projects become hard to maintain due to human-added complexity: bad ideas that are hard to understand, naive hypotheses, bad code that is hard to understand, naive testing, mismatching environments and a growing code base.
Many technical innovations start from White Box methods: newcomers to TDD wrote huge amounts of White Box tests, which became hard to maintain, because they did not want to pay the learning cost involved with Black Box tests.

OBHave is based on mathematical models that are taught at universities right now. Some of the formulas are of my own creation, because mathematics does not always need to think about scalability in the same manner as computer programs do. As an example, in Reinforcement Learning all strategies are evaluated, instead of a sub-set that grows with the number of iterations and with the lack of success in finding the best rewards.

User Group oriented thinking enables a design that adds probabilistic models to Machine Learning and Recommendation Systems. In practice this means that a certain threshold of accuracy is accepted. The fewer probabilistic parameters with thresholds your models have, the more iterations you need to reach conclusions, because more parameters need to converge and be taken into account.

The Machine Learning background process uses Message Queues and monitors the server performance. It has prioritized Learning goals and creates new queue workers for the processes according to the weights of the goals. Each process handles certain kinds of User Groups, combinations of User Groups, or other tasks that are beyond User Groups. This way the maximal potential of the servers' processing power is utilized at all times, within certain safety thresholds for visitor peaks and other such issues configured by the admins. (Server clusters should be designed in such a way that OBHave has one or more dedicated servers, which can have downtime.)
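As a rough illustration of weighted worker creation, the sketch below splits a pool of queue workers between Learning goals in proportion to their weights, after holding back a reserve for visitor peaks. Everything here is an assumption made for the example: the function name `allocate_workers`, the integer `reserved_workers` setting, the largest-remainder rounding and the goal names. The text does not say how OBHave actually sizes its worker pools.

```python
import math
from typing import Dict


def allocate_workers(
    goal_weights: Dict[str, float],  # Learning goal -> priority weight
    capacity: int,                   # total workers the server can run
    reserved_workers: int = 0,       # admin-configured safety reserve (assumed setting)
) -> Dict[str, int]:
    """Split the usable workers between goals in proportion to their weights."""
    usable = max(capacity - reserved_workers, 0)
    total = sum(goal_weights.values())
    if usable == 0 or total <= 0:
        return {goal: 0 for goal in goal_weights}

    shares = {g: usable * w / total for g, w in goal_weights.items()}
    alloc = {g: math.floor(s) for g, s in shares.items()}

    # Largest-remainder rounding: the workers left over after flooring go to
    # the goals with the biggest fractional share (ties broken by goal name).
    leftover = usable - sum(alloc.values())
    by_remainder = sorted(shares, key=lambda g: (alloc[g] - shares[g], g))
    for g in by_remainder[:leftover]:
        alloc[g] += 1
    return alloc


# Hypothetical goals and weights; 2 of 10 workers are kept free for visitor peaks.
alloc = allocate_workers(
    {"returning_users": 3, "sales": 2, "exploration": 1},
    capacity=10,
    reserved_workers=2,
)
print(alloc)
```

Re-running the allocation whenever the monitored server load or the goal weights change would let the worker mix follow the priorities set by the business analysts and admins.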